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Abstract — Acquiring current, accurate land-use information is 
critical for monitoring and understanding the impact of anthropogenic 
activities on natural environments. Remote sensing technologies are of 
increasing importance because of their capability to acquire informa- 
tion for large areas in a timely manner, enabling decision makers to be 
more effective in complex environments. Although optical imagery has 
demonstrated to be successful for land cover classification, active 
sensors, such as light detection and ranging (LiDAR), have distinct 
capabilities that can be exploited to improve classification results. 
However, utilization of LiDAR data for land cover classification has 
not been fully exploited. Moreover, spatial-spectral classification has 
recently gained significant attention since classification accuracy can 
be improved by extracting additional information from the neighbor- 
ing pixels. Although spatial information has been widely used for 
spectral data, less attention has been given to LiDAR data. In this work, 
a new framework for land cover classification using discrete return 
LiDAR data is proposed. Pseudo-waveforms are generated from the 
LiDAR data and processed by hierarchical segmentation. Spatial 
features are extracted in a region-based way using a new unsupervised 
strategy for multiple pruning of the segmentation hierarchy. The 
proposed framework is validated experimentally on a real dataset 
acquired in an urban area. Better classification results are exhibited by 
the proposed framework compared to the cases in which basic LiDAR 
products such as digital surface model and intensity image are used. 
Moreover, the proposed region-based feature extraction strategy 
results in improved classification accuracies in comparison with a 
more traditional window-based approach. 

Index Terms — Classification, hierarchical segmentation (HSeg), 
light detection and ranging (LiDAR), pseudo-waveform, support 
vector machine (SVM). 

I. Introduction 

A CQUIRING current, accurate land-use information is 
critical for monitoring and understanding the impact of 
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anthropogenic activities on natural environments. It is also a 
necessary input for planning the infrastructure and managing the 
rapid change associated with the recent global urbanization. 
Remote sensing technologies are of increasing importance be- 
cause of their capability to acquire information on large areas in a 
timely manner, enabling decision makers to be more effective in 
complex environments. Traditionally, optical remote sensing 
technologies, including high spatial resolution multi-spectral 
imagery [1], [2], have been the first choice for developing land 
cover maps due to the availability of tools for data processing. 
More recently, hyperspectral remote sensing has gained attention 
in the remote sensing application community, including the 
urban areas where signatures of different materials are difficult 
to discriminate [3], [4]. 

Although optical imagery has proved to be successful for land 
cover classification, only data acquired daytime and with cloud- 
free conditions can be effectively used for such purpose. In 
contrast, active sensors such as synthetic aperture radar (SAR) 
and light detection and ranging (LiDAR) operate in both day / 
night conditions and utilize timing information of the outgoing 
pulse and incoming reflected energy to provide vertical informa- 
tion. Operating in the microwave portion of the spectrum, SAR is 
also an all-weather sensing technology and has capability to 
penetrate vegetation, while the small footprint and pulse rate of 
LiDAR allow it to penetrate gaps in canopy. Recently, the use of 
multi-sensor data for performing land cover classification [5]— [7] 
has gained significant attention in the remote sensing community 
due to the capability to exploit complementary information — 
spectral information from the optical sensors and structural 
information from the active sensors — to improve the overall 
classification accuracy of the land cover map. 

LiDAR has recently been demonstrated to be effective for a high 
resolution 3-D mapping [8]— [10], but LiDAR data have not been 
fully exploited for land cover classification. Most of the classifi- 
cation studies that have utilized LiDAR data have focused on the 
discrimination or characterization of vegetation [5], [6], [1 1] and 
have been primarily based on data products such as the digital 
surface model (DSM), digital elevation model (DEM), intensity, 
and relative heights (RHs) [6], [12], [13]. A better representation of 
the vertical structure can be obtained by aggregating discrete 
elevation and intensity measurements into larger footprints to 
create pseudo-waveforms [14]. Also pseudo- waveforms have 
been investigated for assessing the forest structure, while they 
have not been leveraged in the land cover classification problems. 

Most of the traditional classification methods applied to 
spectral reflectance data perform pixel-level classification, 
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assuming spatial independence [15]. Similar approaches can 
also be considered in a multi-sensor environment involving 
LiDAR data. Alternative strategies have been recently pro- 
posed for spatial-spectral classification [16], [17]. The ECHO 
classifier [18] is probably the first method that has combined 
spatial and spectral information. Later, Markov random field 
(MRF) models have been extensively investigated [19]. Other 
approaches include textural features [20], morphological filters 
[1], and segmentation algorithms [21]. The common denomi- 
nator of spatial-spectral classification strategies is to include 
information from the neighboring pixels. However, defining an 
appropriate neighborhood system is nontrivial. The simplest 
solution involves a neighborhood of fixed order, such as a 
window [18], from which spatial features can be easily ex- 
tracted [22]. Although this approach has yielded good results in 
general, it suffers from the so-called “border effect” problem 
where the neighborhood includes pixels from multiple objects. 
The problem can be mitigated by defining the neighborhood 
system in an adaptive way. A first solution involves morpho- 
logical filters [1], in which spatial features can potentially be 
extracted for each pixel from a different neighborhood system 

[23] . Another approach is based on segmentation schemes 
which subdivide an image into nonoverlapping regions by 
adopting homogeneity criteria to assign similar pixels to groups 

[24] . Each region in the segmentation map defines a neighbor- 
hood for all the pixels within this region, from which spatial 
features can then be extracted [25]. Particular attention has 
recently been given to multi-level approaches. Images are 
composed of objects with different shapes and sizes, so it is 
possible to identify objects that are specific to a given level of 
detail and not to others. Considering a neighborhood system 
with a fixed shape, traditional multi-level methods use a pyra- 
mid structure [26] or a set of concentric windows of increasing 
size [27], [28]. Similarly, in the case of adaptive neighborhood 
systems, different levels of detail can be obtained by consider- 
ing different segmentation results [29]. 

The same concepts that have been utilized for spectral data 
could potentially be used in the context of classification of the 
LiDAR data. Most research involving LiDAR data has focused 
on vegetation classification [1 1], [30] or building detection [31], 
[32], whereas little attention has been given to the problem of 
land cover classification. 

One of the state-of-the-art segmentation methods is represented 
by the hierarchical segmentation (HSeg) algorithm [33]. HSeg is 
one of the few available segmentation approaches that naturally 
integrate spatial and spectral information. It involves a combina- 
tion of region object finding by hierarchical step-wise optimiza- 
tion (HSWO, or iterative best merge region growing) [34] and 
region clustering by grouping spectrally similar but spatially 
disjoint regions. HSeg reports its output with a segmentation 
hierarchy, defined as a set of image segmentations at different 
levels of detail, in which segmentation at coarser levels of detail 
are produced by merging regions at finer levels of detail. It is often 
necessary to select a single optimum segmentation level from this 
hierarchy, or in general terms multiple optimum levels need to be 
selected if a multi-level approach is considered. However, select- 
ing the best level(s) is a nontrivial problem, and, in general, it 
depends on the application. The simplest approach consists of 
selecting the appropriate level of segmentation interactively with 
the software HSegViewer [35], as done in [36]. However, an 


automatic procedure would be desirable. Few approaches that are 
based on the HSeg algorithm have been proposed. In a preliminary 
study [37], joint spectral-spatial homogeneity scores are used to 
select the best level(s) in an automatic and unsupervised way. A 
different method has been proposed in [38], in which automati- 
cally selected markers are used to improve the region merging 
process. The method converges naturally to a final segmentation 
map, which can be considered as the best segmentation level of the 
hierarchy. The strategy is supervised, i.e., a set of training samples 
is required to generate the markers. 

A drawback of HSeg is the assumption that the best segmenta- 
tion level(s) corresponds to one (some) actual level(s) of the 
hierarchy. However, it is possible that this assumption is not 
justified, since different objects could be well segmented at 
different levels of the hierarchy. This issue has been addressed 
in the literature by considering binary partition tree (BPT) [39], 
which represents a particular segmentation method in which the 
pair of most similar neighboring regions is merged at each 
iteration. In this context, the process that constructs the final 
segmentation by selecting regions at different levels of the hierar- 
chy is usually referred to as pruning. The objective of the pruning 
strategy is to remove subtrees of the hierarchy that are homoge- 
neous with respect to a defined homogeneity criterion. Pruning 
strategies have recently been proposed in the remote sensing 
literature, based on supervised [40] and unsupervised [7] criteria. 
However, these strategies have been specifically developed for 
BPT and, therefore, cannot be applied in the HSeg context. 

The objective of this paper is to propose a new framework for 
land cover classification using discrete return LiDAR data as a 
preliminary step before fully exploiting synergistic effects 
between the hyperspectral and LiDAR data. In particular, 
pseudo- waveforms are generated from the LiDAR data and 
used in the successive steps of the framework. The HSeg 
algorithm is used to process the pseudo-waveforms and subdi- 
vide the image into homogeneous regions. Multiple levels are 
extracted from the segmentation hierarchy by a new unsuper- 
vised pruning strategy. At this point, spatial features (e.g., 
textural) are extracted from the regions following a region- 
based approach. Finally, the extracted features are combined 
with the original pseudo -wave forms and classified using a 
support vector machine (SVM). The proposed framework is 
validated experimentally on a real dataset acquired in an urban 
area at the University of Houston, Houston, TX, USA. The 
results are compared to those obtained by considering basic 
LIDAR products, such as DSM and intensity. Moreover, 
an analysis in conjunction with hyperspectral data is conducted 
to demonstrate complementary capability of hyperspectral and 
LiDAR data for performing land cover classification. 

The paper is organized as follows. In Section II, the proposed 
framework for land cover classification based on pseudo- 
waveforms and HSeg is described. Section III presents the 
dataset used in the experimental analysis and the correspond- 
ing results. Conclusion and future works are presented in 
Section IV. 

II. Proposed Method 

We propose a new framework for land cover classification 
based on pseudo-waveforms generated from discrete return 
LiDAR data and object-based spatial analysis. The flowchart 
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Fig. 1. Flowchart of the proposed framework. 


of the proposed framework is shown in Fig. 1. It is composed of 
five steps: 1) generation of pseudo-waveforms from discrete 
return LiDAR data; 2) processing of pseudo-waveforms with 
the HSeg algorithm to subdivide the data into spatially homoge- 
neous regions; 3) extraction of multiple levels from the segmen- 
tation hierarchy by a new unsupervised pruning strategy; 

4) extraction of spatial features from the regions by following 
a region-based approach; and 5) combining the extracted spatial 
features with the original pseudo- waveforms to generate final 
classification results. It is important to note that the processes of 
pseudo- waveform generation and object-based feature extrac- 
tion are completely unsupervised, i.e., they are based only on the 
information contained in the acquired LiDAR data and do not 
require labeled training samples. The training samples are used 
only in the last step of the framework to construct the classifica- 
tion model and obtain the final classification map. The following 
sections detail the methodology. 

A. Pseudo-Waveform Generation 

The pseudo-waveforms from LiDAR point cloud data are 
generated using the following method: 


Algorithm 1: Pseudo-waveform generation 


Input: LiDAR point cloud data, parameters d x , d y , and d z 
Output: pseudo-waveform data 

1) Classify LiDAR point cloud data into ground and non- 
ground points. 

2) Generate digital terrain model (DTM) with spatial resolu- 
tion ( d x , d y ) using only ground points and the natural neighbor 
(NN) interpolation algorithm. 

3) Assign LiDAR points to corresponding pixels based on the 
x and y coordinates of the point. 

4) Extract ground elevation of the points from the DTM. 

5) Subtract ground elevation values from the elevation of the 
point (elevation above sea level) and the new z value (z ag ) now 
corresponds to the elevation above ground unit. 


6) For a given pixel, define 3-D voxels with the same 
horizontal dimension as the DTM grid structure (d x , d y ) 
and vertical dimension ( d z ), then stack them for pseudo-wave- 
form generation. These voxels represent the LiDAR signal 
response as a function of elevation, and the LiDAR points 
are assigned to corresponding voxels based on the z ag value of 
the points. 

7) Generate the pseudo- waveform from this voxel structure by 
summing intensity values of all points within each voxel. 

8) Normalize the pseudo- waveform data by dividing the 
values stored in the voxels by the number of points used to 
generate the corresponding pseudo-waveform. This operation is 
necessary to address the issue of nonuniform point density 
distribution, which could be generated, e.g., by acquiring the 
data in multiple strips with side overlap. 


B. HSeg Algorithm 

The HSeg algorithm combines region growing, which pro- 
duces spatially connected regions, with clustering, which groups 
similar spatially disjoint regions. The algorithm is summarized in 
the following, while we refer the reader to [33] for a detailed 
description of the method. 


Algorithm 2: HSeg algorithm 


Input: image, dissimilarity criterion (DC), parameter S w g ht 
Output: segmentation hierarchy 

1) Initialize segmentation by assigning each pixel to a region 
label. If presegmentation is provided, label each pixel accord- 
ingly; otherwise label each pixel as a separate region. 

2) Calculate DC value dissim_val between all pairs of regions. 

3) Set merging threshold thresh_val equal to the smallest value 
dissim_val between pairs of spatially adjacent regions. A spa- 
tially adjacent region for a given region contains pixels located in 
the neighborhood (e.g., four- or eight-neighborhood) of the 
considered region’s pixels. 
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4) Merge pairs of spatially adjacent regions with dissim_val = 
thresh_val. 

5) If S^ght > 0, merge pairs of nonadjacent regions with 
dissirri-val < S^ght • thresh-val. The parameter 5 wg ht sets 
the relative importance of spatially adjacent regions with respect 
to nonadjacent ones. 

6) Stop if convergence is achieved, otherwise return to Step 2. 


Different measures can be used to compute DCs between 
the regions. Two criteria were considered in this study. We refer 
the reader to [35] for a detailed description of all measures 
implemented in the HSeg software. The first criterion is repre- 
sented by the square root of band sum mean squared error 
(SRBSMSE) and is based on minimizing the increase of the 
mean squared error between the region mean image and 
the original data. The SRBSMSE criterion between the two 
regions i and j is defined as 


SRBSMSE^ = 


TliTlj 

rii + Tij 



(1) 


where rii(nj) is the number of pixels for region i ( j ), u ^ (ujb) is 
the mean value for band b and region i ( j ), and d is the total 
number of bands. 

The second criterion is related to the spectral angle mapper 
(SAM), which is defined as 


SAM^- = arccos 


( d 

uibUjb 

b—1 


\ 


U,; U,- 


J 


( 2 ) 


Merging of spatially nonadjacent regions is computationally 
demanding. A significant improvement can be obtained using the 
RHSeg version of the algorithm, which is based on a recursive 
divide-and-conquer approximation of HSeg and its efficient 
parallel implementation [33]. 

HSeg produces as output a hierarchy of segmentation levels. 
However, for practical applications, a subset of one or several 
segmentation levels must be selected from this hierarchy. In 
particular, in our context, spatial features are extracted from 
multiple levels. A new strategy for selecting multiple levels from 
the segmentation hierarchy is proposed in Section II-C. 


C. Multiple Pruning 

The proposed strategy extracts multiple levels from the seg- 
mentation hierarchy in an unsupervised way by following a 
pruning approach. Pruning consists of removing subtrees of 
the hierarchy that are homogeneous with respect to a defined 
homogeneity criterion. In this way, the final segmentation does 
not represent one of the actual levels of the hierarchy, but 
incorporates regions selected potentially from different levels. 
The strategy can be thought of as automatically performing a 
multi-level analysis, i.e., multiple segmentation maps are ex- 
tracted by pruning the hierarchy in different ways. 

An example of pruning applied to the segmentation hierarchy 
given by the HSeg algorithm is shown in Fig. 2. The segmenta- 
tion hierarchy is shown in Fig. 2(a) (we note that this represents 



(a) (b) (c) 


Fig. 2. Example of HSeg pruning: (a) hierarchical segmentation using HSeg 
algorithm, (b) pruning of the HSeg, and (c) final segmentation map. 


Last level of hierarchy 



Pruning 3 
Pruning 2 


Pruning I 


First level of hierarchy 


Fig. 3. Example of multiple pruning. 


the output of the second step “HSeg segmentation” shown in 
Fig. 1). The top level represents the last iteration of the merging 
process, in which the entire image is composed of a single region 
(in red). Two main characteristics of the HSeg algorithm are 
illustrated in this example: 1) Unlike other HSeg algorithms, 
such as BPT, more than two regions can be merged into a single 
region simultaneously. For example, three different regions 
(yellow, orange, and cyan) at level i are merged into a single 
region (cyan) at level i + 1; 2) Spatially, nonadjacent regions can 
be merged. For example, two nonadjacent regions (red and 
magenta) at level i — 1 are merged into a single region (red) at 
level i. The result of the pruning is represented in Fig. 2(b), in 
which the segmentation hierarchy is cut, in general, at different 
levels. The final segmentation map is obtained by composing all 
the regions, as shown in Fig. 2(c) (this coincides with the result of 
the third step “multiple pruning” represented in Fig. 1). Although 
a single pruning of the hierarchy is effectively shown in Fig. 2, 
multiple pruning is shown in Fig. 3, in which three different 
prunings of the hierarchy are performed. Different prunings are 
associated with different segmentation maps, which characterize 
the image at different levels of detail. 

The pruning process can be represented as follows: Assume 
that M(M > 2) regions R l ~ l {j = 1, . . . , M) that are distinct at 
a generic level i — 1 of the hierarchy are merged into a single 
region R l at level i. The problem consists of determining whether 
a cut of the hierarchy is required between level i — 1 and level i 
for the M considered regions. We propose first to characterize 
each region of the hierarchy in terms of local statistics. In 
particular, we consider second-order statistics such as standard 
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Fig. 4 . Example of standard deviation value associated with the belonging region 
for a considered pixel as a function of the segmentation level. 


TABLE I 

Number of Training and Test Samples for the University of Houston Dataset 


I 




Class 

# training samples 

# test samples 

W\. Grass healthy 

30 

168 

w 2 : Grass stressed 

30 

160 

Wy Grass synthetic 

30 

162 

w 4 : Tree 

30 

158 

wy Soil 

30 

156 

w t \ Water 

30 

152 

£ 07 : Residential 

30 

166 

Commercial 

30 

161 

w$: Road 

30 

163 

£y E0 : Highway 

30 

161 

w (| : Railway 

30 

151 

W\ 2 : Parking lot 1 

30 

162 

W\y Parking lot 2 

30 

154 

W\ 4 : Tennis court 

30 

151 

| Wiy Running track 

30 

157 

Total 

450 

2382 


deviation. Considering a generic pixel of the image, the standard 
deviation value associated with the given region tends to vary 
across the hierarchy, as highlighted in the example shown in 
Fig. 4. The graph shows the standard deviation value as a 
function of the hierarchy level. At the beginning, when the 
considered region is composed only by the selected pixel, the 
standard deviation is equal to zero, whereas the maximum value 
is reached at the end, when the entire image is merged into a 
single region. We define a * 1 2 3 4 5 as the standard deviation value 
associated with the region R\ The value is compared to the 
threshold value T, which represents a measure of homogeneity. 
If o l > T, then the region R! is not homogeneous, so the 
hierarchy is cut between level i — 1 and level i in order to have 
M distinct homogeneous regions R l f l (j = 1, . . . , M). The 
main problem is determining the value of the threshold T. We 
propose to select the threshold in an automatic and adaptive way. 
Adaptation means the threshold T depends on the considered 
region R\ i.e., T = T(R l ). This is done by considering the P 
pixels p\(k = 1, . . . , P) belonging to the region R L and comput- 
ing the standard deviation cr\{k = 1, . . . , P) in a fixed window of 
size [w x , w y ] for each of them. The threshold value T for the 
region R l is computed by applying an operator function on the P 
standard deviation values a k (k = 1, . . . , P) associated with the 
pixels belonging to the given region. Among different operators 
(e.g., minimum, mean, median), the median produced the best 
empirical results for our data. However, calculating the threshold 
value T directly in this way could be misleading. A pixel 
belonging to the spatial boundary between different objects is 
intrinsically characterized by a large standard deviation value if it 
is computed in a window. However, the pixels could belong to a 
very homogeneous object, which could be characterized in the 
segmentation hierarchy by a low standard deviation region. To 
alleviate this problem, for a generic pixel k, we consider as g\ not 
necessarily the standard deviation computed by centering the 
window at the pixel, but the value a k — min ( 6 ^,, a l k ), where d\ is 
a vector of standard deviation values calculated by centering the 
window at the pixel’s neighborhoods (e.g., eight-neighborhood 
was used in our study). We note that the standard deviation is 


naturally defined for single-band images. However, in our 
context, data are composed of several bands, so the standard 
deviation is first computed for each band separately, and then the 
mean value is considered. 

Pruning of the hierarchy depends on the size [ w x , w y ] of the 
window used to compute the standard deviation a and conse- 
quently the homogeneity threshold T. Therefore, the multiple 
prunings can be obtained by varying the window size. Moreover, 
although the strategy has been proposed for pruning the seg- 
mentation hierarchy generated by the HSeg algorithm in con- 
junction with LiDAR data, it can be applied to any other HSeg 
method and data type, such as multi- or hyperspectral images. 

The proposed segmentation hierarchy pruning is summarized 
in the following: 


Algorithm 3: Segmentation hierarchy pruning 


Input: image, segmentation hierarchy, parameter W = [w x , w y \ 
Output: segmentation map 

1) Calculate for each pixel k of the image the standard 
deviation values d\, a\ using a window of size [w x , w y ] centered 
at the pixel and at the pixel’s neighborhoods, respectively. 
Consider 5^ = min((j/ c , a^). 

2) Consider each section of the hierarchy in which M ( M > 2) 
regions R l ~ l {j ~ 1 , . . . , M ) that are distinct at a generic level 
i — 1 are merged into a single region R l at level i. 

3) Calculate for the region R l the standard deviation value cr\ 

4) Calculate for the region R l the homogeneity threshold value 
T = median (a 1 ), where a 1 is the set of standard deviation values 
associated with the pixels belonging to the considered region. 

5) If o l > T, cut the hierarchy between levels i — 1 and i in 
order to have M distinct homogeneous regions 
R l j rl (j = 1 , . . . , M ) in the final segmentation map. 
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Fig. 5. Ground reference for the University of Houston dataset. 





(C) 


Fig. 6. False-color composite of the University of Houston dataset obtained from LiDAR data: (a) pseudo-waveform (RGB: bands 20, 10, and 15), (b) DSM, and 
(c) intensity. 


D. Object-Based Feature Extraction and Classification 

At the end of the pruning strategy, different segmentation 
maps are obtained, which represent the data at different levels of 
detail. These segmentation maps can be used to extract spatial 
features (e.g., textural) from the resulting regions. 

In particular, we consider the occurrence and co-occurrence 
textural statistics [20], [41]. The occurrence statistics, or first-order 
features, are computed directly by considering the original values of 
the data. Among different measures, mean and standard deviation 
gave the best empirical results for the data used in these experi- 
ments. The co-occurrence statistics, or second-order features, are 
based on the grey-level co-occurrence matrix (GLCM). On the 
basis of this matrix, different texture descriptors can be extracted. 
Among the 14 descriptors proposed originally in [41 ], we consider 
six different parameters: homogeneity, contrast, dissimilarity, 


entropy, second moment, and correlation. We note that the textural 
features are extracted independently from each band of the data. 

The last step of the proposed framework consists of generating 
the final classification map. The extracted features are combined 
with the original pseudo- waveforms and classified. We adopt a 
standard S VM classifier, which has shown good performance for 
most remote sensing tasks [42]. We refer the reader to [43] for 
more details about SVM. 

III. Experimental Results and Discussion 
A. Experimental LiDAR and Hyperspectral Data 

The dataset [44] used in this work comprises discrete return 
LiDAR and a hyperspectral image, acquired over the University 
of Houston campus. The LiDAR data were acquired on June 22, 
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Fig. 7. False-color composite of the University of Houston dataset obtained from hyperspectral data (RGB: bands 105, 61, and 40). 






<d) (e) (f) 


Fig. 8. Example of pseudo-waveforms from LiDAR data (in red) and spectral signatures from hyperspectral data (in black) for (a) tree, (b) residential, (c) commercial, 
(d) grass stressed, (e) road, and (f) water classes. 


2012, between the times 14:37:55 and 15:38:10 UTC. The data 
were acquired by Optech Gemini system [45] which has maxi- 
mum pulse repetition rate of 1 67 kHz and can fly up to 4000 m 
above ground level. It can record up to four returns from a return 
pulse. The average height of the sensor at the time of acquisition 
was 2000 ft above ground level, which resulted in average point 
density of 35.38 points/m 2 on the ground. The hyperspectral 
imagery (acquired with the ITRES-CASI 1500 sensor [46]) 
consists of 144 spectral bands in the 380-1050 nm region 
and was processed (radiometric correction, attitude processing, 
GPS processing, geo-correction, etc.) to yield the final geo- 
corrected image cube representing at-sensor spectral radiance, 
SRU = /i W/cm 2 sr nm. Processing software provided by the 
manufacturer (ITRES) was utilized for this postprocessing. The 
hyperspectral data were acquired on June 23, 2012 between 
the times 17:37:10 and 17:39:50 UTC. The average height of 
the sensor above ground was 5500 ft, which resulted in 2.5-m 
spatial resolution data. 

The study area represents an urban scenario, with 1 5 classes 
that are listed in Table I. The ground reference map is shown in 


Fig. 5. The scene includes vegetation classes, as well as classes 
representing common materials in a typical urban analysis task. 
These classes were identified by photo- interpreting a very high 
resolution optical imagery of the scene acquired during the same 
collection. “Grass” was represented as three classes — healthy, 
stressed, and synthetic. Healthy and stressed grass was identified 
through a threshold of the normalized difference vegetation 
index (NDVI); synthetic grass represents artificial grass/turf. 
Parking lots were categorized into two distinct types — 1 and 2 
(one represents empty lots, and two represents lots with cars). 
Other classes in the dataset, as listed in Table I, are self- 
explanatory. From the available ground reference, 30 samples 
per class were selected randomly as training samples. The 
remaining pixels constituted the test set. 

B. Experimental Setup 

LiDAR points were classified into ground and nonground 
points using LASTools [47] with “-metro” option since the 
study area is located in the southeastern part of the metropolitan 
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(b) 

Fig. 9. Result of the proposed pruning strategy on the segmentation hierarchy generated by the HSeg algorithm on the pseudo-waveforms data: (a) segmentation map 
and (b) pruning level map. 


city, Houston. The same grid structure of the hyperspectral 
data with 2.5-m spatial resolution was adopted in generating 
pseudo- waveform data from the discrete return LiDAR data. The 
DTM with the same 2.5-m grid structure was generated using 
only ground points and the NN interpolation algorithm. For a 
given pixel, 80 voxels with the same horizontal dimension as the 
grid (d x = 2.5 m, ^ = 2.5 m) and 1-m vertical resolution 
(i d z = 1 m) were created. The voxels were designated such that 
the 1) bottom, 2) 10th, and 3) top voxel corresponds to the 
LiDAR signal at 1) 9 m below the ground elevation, 2) ground 
elevation, and 3) 70 m above the ground elevation of the 
corresponding pixel location, and each layer of the voxel struc- 
ture can be interpreted as a band that represents LiDAR signal 
response at the corresponding elevation above the ground. The 
DSM and intensity images were also generated from the original 
LiDAR point cloud data to compare the results obtained using 
these data to those obtained using pseudo- waveforms. The DSM 
was generated using the same 2.5-m grid structure by calculating 
the maximum elevation of the points within every pixel. The 
intensity image was generated from the point cloud data by first 
summing all intensity value of points assigned to the pixel and 
then by dividing the sum by the number of points of the pixel. 

LiDAR products and hyperspectral image are shown in 
Figs. 6 and 7, respectively. Moreover, for some specific 
pixels of the images, represented with crosses and related to 
different classes, we show in Fig. 8 the pseudo- waveform and 


the corresponding spectral signature: 1) Tree , 2) Residential , 
3) Commercial , 4) Grass stressed , 5) Road , and 6) Water. These 
examples illustrate that the structural differences represented in 
pseudo- waveforms can be utilized in the classification process. 
We especially note that two classes, i.e., Residential and Com- 
mercial ', are spectrally similar but have different structural 
characteristics. 

The HSeg algorithm was applied to pseudo- waveform, DSM, 
intensity, and hyperspectral data by adopting a four-neighborhood 
connectivity. In terms of DC, the pseudo- waveform and hyper- 
spectral data were processed using the SAM criterion [48] (we 
note that in this way all available bands are considered by the 
algorithm). However, since this criterion is not meaningful for 
single-band images, the SRBSMSE measure was adopted for 
DSM and intensity data. For all cases, the parameter S w ^t, which 
sets the importance of nonadjacent regions in the region growing 
process, was fixed to 0.1. 

The strategy proposed for segmentation hierarchy pruning 
requires setting two main parameters. The number of prunings 
of the hierarchy, i.e., the number of segmentation maps ex- 
tracted from the HSeg result, is an input. For all the cases, we 
considered three different prunings. For each pruning, it is 
necessary to define the parameter W = [w x ,w y \, which is 
related to the size of the window used for computing the 
homogeneity threshold. Windows with size [3, 3], [5, 5], and 
[7, 7] were considered. 
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TABLE II 

OA, AA, Kappa, and Class Accuracies Achieved on the University of Houston Dataset Using LiDAR Data 


Data 

PW 

DSM 

IN 

ORI 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

SP1 

- 

X 

X 

X 

X 

- 

X 

X 

X 

X 

- 

X 

X 

X 

X 

SP2 

- 

- 

X 

- 

X 

- 

- 

X 

- 

X 

- 

- 

X 

- 

X 

W/R 

- 

W 

W 

R 

R 

- 

W 

W 

R 

R 

- 

W 

W 

R 

R 

OA 

55.75 

75.65 

81.36 

84.01 

87.95 

37.78 

50.00 

61.88 

53.57 

70.65 

33.59 

49.20 

61.46 

57.89 

67.63 

AA 

55.93 

75.79 

81.45 

84,04 

88.01 

38.36 

50.37 

61.94 

53.90 

70.79 

33.88 

49.40 

61.55 

58,06 

67,76 

K 

52.61 

73.91 

80.03 

82.86 

87.09 

33.42 

46.46 

59.16 

50,28 

68.56 

28.91 

45.60 

58.71 

54.90 

65.33 


39.88 

63.69 

68,45 

92.86 

85.71 

27.38 

25.00 

85.71 

2 1 .43 

74.40 

45.24 

73.21 

7738 

68.45 

77,38 

® 2 

64,38 

80.63 

79,38 

48.13 

77.50 

0.00 

6.25 

7.50 

9.38 

32.50 

64.38 

71,88 

59.38 

73.13 

70.00 


99.38 

100 

100 

100 

100 

63.58 

72.22 

88.89 

61.73 

80.86 

96.91 

91.98 

82.72 

91.36 

85.19 

®4 

79,75 

89.87 

93,67 

76.58 

72.78 

20,89 

53.80 

88.61 

53,16 

93,04 

0.00 

8,23 

42.41 

21,52 

53.16 

®5 

41.67 

58.33 

58.97 

81.43 

87.82 

16.03 

5.13 

12.18 

19,23 

38,46 

30.13 

32.05 

62.82 

60,26 

69.87 

*>6 

0.00 

48.03 

81,58 

71.71 

80,92 

63.82 

76.97 

59.21 

78.29 

70,39 

0.00 

39.47 

79.61 

45.39 

89.47 


63.86 

86.75 

93.37 

87.35 

83.13 

22.89 

67.47 

78.31 

7831 

86.75 

7.23 

40.36 

58.43 

53.01 

49.40 

® 8 

84.47 

86.34 

90,06 

93.79 

88.20 

82.61 

77.64 

84.47 

80.12 

89,44 

10.56 

13.66 

50.31 

22,36 

57.14 


0.00 

34.97 

64.42 

77.30 

87.12 

0.00 

17.18 

19.02 

34.97 

42.33 

0.00 

30.67 

36.20 

49.69 

51.53 


21.74 

69.57 

64.60 

73.91 

77.64 

0.00 

52.80 

63.35 

62.11 

65.22 

6.21 

34.78 

50.93 

49.07 

58.39 


85.43 

88.74 

85,43 

98.68 

97.35 

76.82 

72.85 

84,77 

72.85 

85,43 

92,05 

92,72 

60.26 

84.77 

72.85 

0>\2 

28.40 

51.85 

70.37 

79.01 

96.91 

0.00 

17.90 

32.10 

24.07 

42.59 

0.00 

11.11 

48.77 

29.01 

64.20 

<a?i3 

68.18 

92.21 

98.70 

82.47 

87.01 

51.30 

58.44 

80.52 

55.84 

81.17 

9.09 

62.34 

72.08 

73,38 

74.68 


82.78 

96.03 

87.42 

98.01 

98,68 

93.38 

90.07 

78.81 

93.38 

90.73 

79.47 

76.16 

63.58 

78.81 

62.91 

G>I5 

78.98 

89.81 

85.35 

99.36 

99.36 

56.69 

61.78 

65.61 

63.69 

88.54 

66.88 

62.42 

78.34 

70.70 

80.25 


The features were normalized in the range [0, 1]. SVM 
classification was accomplished using a Gaussian RBF kernel. 
The SVM hyper-parameters were selected by a fivefold cross 
validation applied on the training set. The C and 7 parameters 
were selected in the range [ 2 ~ 5 , 2 15 ] and [ 2 -15 , 2 3 ], respectively. 
The LIBSVM library [49] was used to solve the SVM classifi- 
cation problem. Classification performances were evaluated by 
different measures: 1) the overall accuracy (OA), 2) the Kappa 
statistic [50], 3) the classification accuracies obtained for indi- 
vidual classes, and 4) the average accuracy (AA). 

The experimental analysis can be subdivided into two main 
parts. In the first part, we compared the classification results 
obtained by considering the proposed framework, in which 
pseudo- waveforms were used, with the results obtained by 
adopting basic LiDAR products, such as DSM and intensity. 
Moreover, the results given by the proposed object-based feature 
extraction were compared to those given by a more traditional 
window-based approach. In this case, for a fair comparison, the 
same window sizes were considered. In the second part, the 
analysis of hyperspectral and LiDAR data was conducted to 
demonstrate complementary of the two data sources for perform- 
ing land cover classification. 

C. Experimental Results and Discussion 

Before presenting the results in terms of classification accura- 
cies, we evaluate the proposed pmning strategy in a qualitative 
way. We show in Fig. 9 the result obtained by applying the pruning 
strategy on the segmentation hierarchy generated by the HSeg 
algorithm on the pseudo- waveform data. In this case, the parameter 
[w x , w y ] necessary for estimating the homogeneity threshold was 
set to [3, 3]. Therefore, the resulting segmentation map (random 
colors have been assigned to different regions), shown in Fig. 9(a), 


represents a finer level of detail of the image. A good segmentation 
result has been obtained, as highlighted by the zoomed parts of the 
image. Moreover, we show in Fig. 9(b), the corresponding 
pmning level map. The color of the map refers to the level in 
which the segmentation hierarchy was cut for the considered 
region. For example, blue and red regions were extracted from 
the first and last levels of the hierarchy, respectively. It is clear how 
the segmentation map was effectively obtained by extracting 
different regions from different levels of the hierarchy. 

In the first part of the experiments, classification accuracies 
based on the proposed framework by adopting pseudo- waveforms 
were compared with the results obtained by considering basic 
LiDAR products, such as DSM and intensity (IN). The results are 
summarized in Table II. In general, the proposed framework 
in conjunction with pseudo-waveform data greatly improved 
the classification accuracies in comparison with the other cases 
(DSM and intensity). When only the original (ORI) data were 
considered, the pseudo- waveform gave OA = 55.75%, and OA 
was equal to 37.78% and 33.59% for the DSM and intensity cases, 
respectively. Higher accuracies were exhibited by pseudo-wa- 
veforms also when spatial features were added to the original 
data. Comparing region-based (R) and window-based (W) 
feature extraction approaches, better results were obtained 
using the proposed approach. For example, by adding the 
first-order textural features (SP1), the OA for the pseudo- 
waveform case was equal to 84.01% and 75.65% for the 
region-based and the window-based approaches, respectively. 
Similar improvements were also verified by considering 
DSM (OA = 53.57% and 50.00%, respectively) and intensity 
(OA = 57.89% and 49.20%, respectively) data. These results 
validated the effectiveness of the proposed region-based feature 
extraction strategy. Similar considerations can also be per- 
formed by adding the second-order textural features (SP2), in 
which a further improvement of the accuracies was verified. 
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TABLE III 

OA, AA, Kappa, and Class Accuracies Achieved on the University of Houston Dataset Using Pseudo-Waveform and Hyperspectral Data 


Data 

PW 

HYP 

ORI 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

SPi 

- 

X 

X 

X 

X 

- 

X 

X 

X 

X 

SP2 

- 

- 

X 

- 

X 


- 

X 

- 

X 

W/R 

- 

W 

W 

R 

R 

- 

W 

W 

R 

R 

OA 

55.75 

75.65 

81.36 

84.01 

87.95 

89.38 

92.49 

92,78 

94,25 

95.17 

AA 

55.93 

75.79 

81.45 

84.04 

88.01 

89.44 

92.56 

92*87 

94.24 

95,22 

K 

52.61 

73.91 

80.03 

82.86 

87.09 

88.62 

91.95 

92,26 

93.84 

94.83 

Q}\ 

39.88 

63.69 

68.45 

92.86 

85.71 

94.05 

92.26 

95,83 

98*21 

98.81 

Q} 2 

64.38 

80.63 

79.38 

48.13 

77.50 

100 

100 

100 

100 

96.88 

a> $ 

99.38 

100 

100 

100 

100 

100 

100 

100 

100 

98.77 

ffl 4 

79.75 

89.87 

93.67 

76.58 

72.78 

100 

100 

100 

100 

100 

Q} 5 

4L67 

58.33 

58.97 

81.41 

87.82 

96.79 

100 

100 

100 

99.36 

a>6 

0.00 

48.03 

81.58 

71.71 

80.92 

98.68 

96.05 

93*42 

96*71 

94.74 

ffl 7 

63.86 

86.75 

93.37 

87.35 

83.13 

86.14 

89.76 

89,16 

99.40 

81.93 


84.47 

86.34 

90.06 

93.79 

88.20 

72.67 

75.78 

75*78 

77.64 

86.34 

£>9 

0.00 

34.97 

64.42 

77.30 

87.12 

77.91 

87.73 

84*66 

90*80 

93.25 

W|0 

2 1 *74 

69.57 

64.60 

73.91 

77.64 

81.99 

77.64 

76*40 

85.71 

93.17 

m u 

85.43 

88.74 

85.43 

98.68 

97.35 

95.36 

95.36 

96,03 

94.70 

94.70 

ton 

28.40 

51.85 

70.37 

79.01 

96.91 

80.25 

91.36 

88*27 

94.44 

98.77 

0> 13 

68.18 

92.21 

98.70 

82.47 

87.01 

59.09 

82.47 

93*51 

76.62 

92.86 

m l4 

82.78 

96.03 

87.42 

98.01 

98.68 

98.68 

100 

100 

99.34 

98.68 

5 

78.98 

89.81 

85.35 

99.36 

99.36 

100 

100 

100 

100 

100 


Considering the pseudo-waveform (DSM and intensity) data, 
OA was equal to 87.95% (70.65% and 67.63%) and 81.36% 
(61.88% and 61.46%) for the region-based and the window- 
based approaches, respectively. The best result (OA = 87.95%) 
was obtained when we considered both first- and second-order 
textural features extracted from pseudo- waveform data by 
following a region-based approach. This indicates that utilizing 
pseudo-waveform data generated from the raw LiDAR point 
cloud has really significantly improved the overall classifica- 
tion accuracy for these data. 

In the second part of the experimental analysis, we investi- 
gated the classification accuracies obtained by applying the 
proposed framework to the hyperspectral data. The results are 
shown in Table III. For hyperspectral data, we observed a 
classification accuracy pattern similar to that described previ- 
ously for LiDAR data. In particular, also for hyperspectral data, 
classification accuracies were improved as we added the textural 
features to the original spectral signatures. Moreover, better 
results were obtained by extracting features by following a 
region-based approach. The best result was verified when both 
first- and second-order features were considered (OA equals to 
95.17% and 87.95% for hyperspectral and pseudo-waveform 
data, respectively). Analyzing in greater detail the class accura- 
cies, six classes (ui, uj§, ujq, ujg, ujiq, and cui 2 ) yielded weak 
classification results when only the original pseudo-waveform 
data were considered (uoi\ 39.88%, 6U5: 41.67%, ujq: 0.00%, cug: 
0.00%, ujiq: 21.74%, and uj i 2 : 28.40%). However, the inclusion 
of textural features improved the accuracies for these weak 
classes (Acui: 45.83%, Ac u 5 : 46.15%, Acu 6 : 80.92%, Acug: 
87.12%, Atui 0 : 55.9%, and Acui 2 : 68.51%). Although the OA 


for the hyperspectral data was higher than that obtained for the 
pseudo- waveform data, when both first- and second-order 
textural features were utilized in addition to the original data, 
three classes (1U7, ujg, and wn) yielded better accuracies for the 
pseudo- waveform case. These observations indicate that these 
classes ( Residential , Commercial , and Railway ) can be better 
separated using the structural information provided by the 
pseudo- waveform data. These observations suggest how a fur- 
ther improvement of the overall classification accuracy can be 
obtained by fusing LiDAR and hyperspectral data. 

IV. Conclusion 

In this work, we have proposed a new framework for land 
cover classification using discrete return LiDAR data. In partic- 
ular, pseudo-waveforms have been generated from the discrete 
return LiDAR data and processed by the HSeg segmentation 
algorithm. A new strategy for multiple pruning of the segmenta- 
tion hierarchy has been proposed in order to extract spatial 
features from homogeneous regions by following a region-based 
approach. Finally, the extracted features have been combined 
with the original pseudo-waveforms data and classified using 
SVM classification. 

Novelties and key points of the proposed framework can be 
summarized as follows: 1) the land cover classification problem 
has been addressed in an innovative way by incorporating 
pseudo- waveforms and object-based spatial analysis; 2) the 
effectiveness of the state-of-the-art HSeg algorithm has been 
validated on LiDAR data, whereas previous research has 
been conducted on multi- and hyperspectral data; 3) the HSeg 
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algorithm has been used as a preprocessing step for object-based 
feature extraction, while it was used for different purposes in the 
previous works (e.g., as postprocessing for spatial regularization 
[33], [36] or as preprocessing for ground reference design [51], 
[52]); and 4) the pruning of the segmentation hierarchy has been 
addressed by a new unsupervised multi-level strategy. 

The proposed framework has been validated experimentally 
on a dataset acquired in an urban area at the University of 
Houston. The experimental results reveal that the proposed 
framework, in conjunction with pseudo- waveform data, has 
given an improvement of the classification accuracies with 
respect to the cases in which basic LiDAR products, such as 
DSM or intensity, have been used. Moreover, the proposed 
region-based feature extraction process has been able to give 
better results in comparison with a more traditional window- 
based approach. Finally, although the best accuracies have been 
obtained, in general, by considering the hyperspectral data, the 
use of pseudo- waveforms has shown a promising performance. 
In particular, some classes have been classified better using 
LiDAR data than with the hyperspectral data. This suggests 
classification accuracies may be further improved by fusing the 
LiDAR and hyperspectral data. Future research will be con- 
ducted to fully exploit synergistic effects between the LiDAR 
and hyperspectral data for performing land cover classification. 
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